NumPy Basics. Working with multidimensional arrays

Basic tutorials:

http://wiki.scipy.org/Tentative_NumPy_Tutorial

http://nbviewer.ipython.org/github/jrjohansson/scientific-python-lectures/blob/master/Lecture-2-Numpy.ipynb

http://scipy-lectures.github.io/intro/numpy/array_object.html



In [2]:

    
import numpy as np
from StringIO import StringIO

Loading data into numpy



In [79]:

    
!head wine_names.csv









    



Class, a, b, c, d, e, f, g, h, i, j, k, l, m
1,14.23,1.71,2.43,15.6,127,2.8,3.06,.28,2.29,5.64,1.04,3.92,1065
1,13.2,1.78,2.14,11.2,100,2.65,2.76,.26,1.28,4.38,1.05,3.4,1050
1,13.16,2.36,2.67,18.6,101,2.8,3.24,.3,2.81,5.68,1.03,3.17,1185
1,14.37,1.95,2.5,16.8,113,3.85,3.49,.24,2.18,7.8,.86,3.45,1480
1,13.24,2.59,2.87,21,118,2.8,2.69,.39,1.82,4.32,1.04,2.93,735
1,14.2,1.76,2.45,15.2,112,3.27,3.39,.34,1.97,6.75,1.05,2.85,1450
1,14.39,1.87,2.45,14.6,96,2.5,2.52,.3,1.98,5.25,1.02,3.58,1290
1,14.06,2.15,2.61,17.6,121,2.6,2.51,.31,1.25,5.05,1.06,3.58,1295
1,14.83,1.64,2.17,14,97,2.8,2.98,.29,1.98,5.2,1.08,2.85,1045

http://stackoverflow.com/questions/12336234/read-csv-file-to-numpy-array-first-row-as-strings-rest-as-float



In [71]:

    
data = np.genfromtxt("wine_names.csv", dtype=None, delimiter=',', skip_header=1)



In [3]:

    
data = np.genfromtxt("wine_names.csv", dtype=float, delimiter=',', skip_header=1)
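The two calls above differ only in dtype: dtype=None asks genfromtxt to guess a type per column, while dtype=float forces a plain 2-D float array. A minimal sketch of the second variant follows. It assumes Python 3's io.StringIO and uses an in-memory excerpt (the first three rows and four columns of the listing above) instead of the real file; the name csv_text is only illustrative.

```python
import io
import numpy as np

# In-memory excerpt in the same layout as wine_names.csv (illustrative only).
csv_text = u"""Class, a, b, c
1,14.23,1.71,2.43
1,13.2,1.78,2.14
1,13.16,2.36,2.67
"""

# skip_header=1 drops the text header so every remaining field parses as float.
sample = np.genfromtxt(io.StringIO(csv_text), dtype=float,
                       delimiter=',', skip_header=1)

# The header can be recovered separately with plain string handling.
names = [name.strip() for name in csv_text.splitlines()[0].split(',')]

print(sample.shape)
print(names)
```

Keeping the column names in a separate list leaves the array purely numeric, which is what the scaling steps later in this notebook rely on.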



In [4]:

    
data









    Out[4]:





array([[  1.00000000e+00,   1.42300000e+01,   1.71000000e+00, ...,
          1.04000000e+00,   3.92000000e+00,   1.06500000e+03],
       [  1.00000000e+00,   1.32000000e+01,   1.78000000e+00, ...,
          1.05000000e+00,   3.40000000e+00,   1.05000000e+03],
       [  1.00000000e+00,   1.31600000e+01,   2.36000000e+00, ...,
          1.03000000e+00,   3.17000000e+00,   1.18500000e+03],
       ..., 
       [  3.00000000e+00,   1.32700000e+01,   4.28000000e+00, ...,
          5.90000000e-01,   1.56000000e+00,   8.35000000e+02],
       [  3.00000000e+00,   1.31700000e+01,   2.59000000e+00, ...,
          6.00000000e-01,   1.62000000e+00,   8.40000000e+02],
       [  3.00000000e+00,   1.41300000e+01,   4.10000000e+00, ...,
          6.10000000e-01,   1.60000000e+00,   5.60000000e+02]])

Data is from:

https://archive.ics.uci.edu/ml/datasets/Wine

Numpy Basics

ndarray.ndim > the number of axes (dimensions) of the array. In the Python world, the number of dimensions is referred to as rank. (Note: this is not the number of columns. A table with 14 columns is still a two-dimensional array; the column count is shape[1].)

ndarray.shape > the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the rank, or number of dimensions, ndim.

ndarray.size > the total number of elements of the array. This is equal to the product of the elements of shape.

ndarray.dtype > an object describing the type of the elements in the array. One can create or specify dtype's using standard Python types. Additionally NumPy provides types of its own. numpy.int32, numpy.int16, and numpy.float64 are some examples.
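A small sketch of the four attributes described above, using a made-up 3 x 4 array rather than the wine data:

```python
import numpy as np

# A small 2-D array: 3 rows (data points) and 4 columns (dimensions).
a = np.arange(12, dtype=np.float64).reshape(3, 4)

print(a.ndim)    # number of axes
print(a.shape)   # size along each axis
print(a.size)    # total number of elements
print(a.dtype)   # element type
```

Note that ndim stays 2 no matter how many columns are added; only shape and size change.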

http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html#numpy.ndarray



In [73]:

    
data.shape









    Out[73]:





(178, 14)

178 rows and 14 columns: 178 data points with 14 dimensions each (the class label plus 13 measurements)



In [74]:

    
data.ndim









    Out[74]:





2

Print options



In [128]:

    
np.set_printoptions(threshold=np.inf)

Print everything ↑



In [75]:

    
np.set_printoptions(edgeitems=3,infstr='inf',
linewidth=75, nanstr='nan', precision=8,
suppress=False, threshold=1000, formatter=None)

Default settings ↑
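The threshold option decides when NumPy switches from printing every element to a summary with an ellipsis. The sketch below shows the effect without changing the global settings, by passing threshold directly to np.array2string; the array long_array is just an illustration.

```python
import sys
import numpy as np

long_array = np.arange(2000)

# Above the threshold, NumPy prints only the edge items and an ellipsis.
summarized = np.array2string(long_array, threshold=1000)

# A threshold larger than the array size forces every element to be printed.
full = np.array2string(long_array, threshold=sys.maxsize)

print(summarized)
print(len(full) > len(summarized))
```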

Read out a row (data point) with index n



In [90]:

    
data[10]









    Out[90]:





array([  1.00000000e+00,   1.41000000e+01,   2.16000000e+00,
         2.30000000e+00,   1.80000000e+01,   1.05000000e+02,
         2.95000000e+00,   3.32000000e+00,   2.20000000e-01,
         2.38000000e+00,   5.75000000e+00,   1.25000000e+00,
         3.17000000e+00,   1.51000000e+03])

Read out a column (dimension)



In [92]:

    
data[:,1]









    Out[92]:





array([ 14.23,  13.2 ,  13.16,  14.37,  13.24,  14.2 ,  14.39,  14.06,
        14.83,  13.86,  14.1 ,  14.12,  13.75,  14.75,  14.38,  13.63,
        14.3 ,  13.83,  14.19,  13.64,  14.06,  12.93,  13.71,  12.85,
        13.5 ,  13.05,  13.39,  13.3 ,  13.87,  14.02,  13.73,  13.58,
        13.68,  13.76,  13.51,  13.48,  13.28,  13.05,  13.07,  14.22,
        13.56,  13.41,  13.88,  13.24,  13.05,  14.21,  14.38,  13.9 ,
        14.1 ,  13.94,  13.05,  13.83,  13.82,  13.77,  13.74,  13.56,
        14.22,  13.29,  13.72,  12.37,  12.33,  12.64,  13.67,  12.37,
        12.17,  12.37,  13.11,  12.37,  13.34,  12.21,  12.29,  13.86,
        13.49,  12.99,  11.96,  11.66,  13.03,  11.84,  12.33,  12.7 ,
        12.  ,  12.72,  12.08,  13.05,  11.84,  12.67,  12.16,  11.65,
        11.64,  12.08,  12.08,  12.  ,  12.69,  12.29,  11.62,  12.47,
        11.81,  12.29,  12.37,  12.29,  12.08,  12.6 ,  12.34,  11.82,
        12.51,  12.42,  12.25,  12.72,  12.22,  11.61,  11.46,  12.52,
        11.76,  11.41,  12.08,  11.03,  11.82,  12.42,  12.77,  12.  ,
        11.45,  11.56,  12.42,  13.05,  11.87,  12.07,  12.43,  11.79,
        12.37,  12.04,  12.86,  12.88,  12.81,  12.7 ,  12.51,  12.6 ,
        12.25,  12.53,  13.49,  12.84,  12.93,  13.36,  13.52,  13.62,
        12.25,  13.16,  13.88,  12.87,  13.32,  13.08,  13.5 ,  12.79,
        13.11,  13.23,  12.58,  13.17,  13.84,  12.45,  14.34,  13.48,
        12.36,  13.69,  12.85,  12.96,  13.78,  13.73,  13.45,  12.82,
        13.58,  13.4 ,  12.2 ,  12.77,  14.16,  13.71,  13.4 ,  13.27,
        13.17,  14.13])

Read out a value inside an array

First value (row index 0) of the second column (column index 1)



In [103]:

    
data[0,1]









    Out[103]:





14.23

Second value (row index 1) of the second column (column index 1)



In [104]:

    
data[1,1]









    Out[104]:





13.199999999999999
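The three access patterns used so far (row, column, single value) can be compared side by side on a small stand-in array. The array m below holds the first three rows and four columns of the wine data and is only an illustration.

```python
import numpy as np

# Stand-in for `data`: 3 rows, 4 columns.
m = np.array([[1.0, 14.23, 1.71, 2.43],
              [1.0, 13.20, 1.78, 2.14],
              [1.0, 13.16, 2.36, 2.67]])

row = m[1]         # whole second row, same as m[1, :]
col = m[:, 1]      # whole second column
value = m[1, 1]    # second row, second column

print(row)
print(col)
print(value)
```

The first index always selects the row (data point) and the second the column (dimension); a colon stands for "everything along this axis".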

This makes printing and accessing the values of a single data point manageable. The next step is to min-max scale all values of one dimension to a parameter range for the SynthDef (sent as OSC messages), such as frequency.

Before that, some basics about the communication between IPython and SC3 via OSC.

How to send OSC messages to SC3 with ipython?

Download pyOSC here: https://trac.v2.nl/wiki/pyOSC and run sudo python setup.py install inside the pyOSC folder.

On OSC more here: http://opensoundcontrol.org/introduction-osc

Source: http://www.caseyanderson.com/teaching/ipython-to-supercollider-via-osc/

Load the code below in SC3 (Supercollider): http://supercollider.sourceforge.net/

(
SynthDef("grain", { |out, amp=0.1, freq=440, sustain=0.01, pan|
    var snd = FSinOsc.ar(freq);
    var amp2 = amp * AmpComp.ir(freq.max(50)) * 0.5;
    var env = EnvGen.ar(Env.sine(sustain, amp2), doneAction: 2);
    OffsetOut.ar(out, Pan2.ar(snd * env, pan));
}, \ir ! 5).add;
)



In [26]:

    
import OSC
import time, random
client = OSC.OSCClient()
client.connect( ( '127.0.0.1', 57110 ) )

OSC message to send: s.sendMsg("s_new", \grain, -1, 0, 1, \freq, 200, \sustain, 0.1, \pan, -1.0);



In [66]:

    
msg = OSC.OSCMessage()
msg.setAddress("s_new")
msg.append("grain")
msg.append(-1)
msg.append(0)
msg.append(1)
msg.append("amp")
msg.append(1)
msg.append("freq")
msg.append(4000)
msg.append("sustain")
msg.append(0.1)
msg.append("pan")
msg.append(0)
client.send(msg)
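The s_new message built above has a fixed layout: SynthDef name, node ID (-1 lets the server pick one), add action (0 adds to the head of the target), target ID (1 is the default group), followed by alternating control names and values. A minimal sketch of that layout as a plain Python list follows. The helper s_new_args is hypothetical and not part of pyOSC, and it sorts the control names only to make the order predictable.

```python
def s_new_args(synthdef, node_id=-1, add_action=0, target=1, **controls):
    """Return the argument list of an s_new message (hypothetical helper)."""
    args = [synthdef, node_id, add_action, target]
    # Control names and values follow as alternating pairs.
    for name in sorted(controls):
        args.extend([name, controls[name]])
    return args

args = s_new_args("grain", amp=1, freq=4000, sustain=0.1, pan=0)
print(args)
```

With pyOSC, each item of such a list could then be passed to msg.append(item) in a loop instead of writing one append call per line.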



In [186]:

    
import time, sys
for i in range(100):
    
    msg = OSC.OSCMessage()
    msg.setAddress("s_new")
    msg.append("grain")
    msg.append(-1)
    msg.append(0)
    msg.append(1)
    msg.append("amp")
    msg.append(1)
    msg.append("freq")
    msg.append(440+(i*10))
    msg.append("sustain")
    msg.append(0.15)
    msg.append("pan")
    msg.append(1)
    client.send(msg)
    
    msg = OSC.OSCMessage()
    msg.setAddress("s_new")
    msg.append("grain")
    msg.append(-1)
    msg.append(0)
    msg.append(1)
    msg.append("amp")
    msg.append(1)
    msg.append("freq")
    msg.append(1440+(i*10))
    msg.append("sustain")
    msg.append(0.15)
    msg.append("pan")
    msg.append(-1)
    client.send(msg)
    
    time.sleep(0.04)

Create/define a function



In [188]:

    
def oscgrain( frequency ):
    msg = OSC.OSCMessage()
    msg.setAddress("s_new")
    msg.append("grain")
    msg.append(-1)
    msg.append(0)
    msg.append(1)
    msg.append("amp")
    msg.append(1)
    msg.append("freq")
    msg.append(frequency)     #read in data points
    msg.append("sustain")
    msg.append(0.015) #0.01-0.04
    msg.append("pan")
    msg.append(-1)
    client.send(msg)



In [190]:

    
oscgrain(1000)

Loading the array



In [4]:

    
data[0,1]









    Out[4]:





14.23

Reading-out values from the array with a timed loop

reading out one data point



In [194]:

    
data[1,:]









    Out[194]:





array([  1.00000000e+00,   1.32000000e+01,   1.78000000e+00,
         2.14000000e+00,   1.12000000e+01,   1.00000000e+02,
         2.65000000e+00,   2.76000000e+00,   2.60000000e-01,
         1.28000000e+00,   4.38000000e+00,   1.05000000e+00,
         3.40000000e+00,   1.05000000e+03])



In [130]:

    
import time
for i in range(14):
    print(data[1,i])
    time.sleep(0.4)

Min max scaling the data

reading-out one dimension



In [153]:

    
data[:,1]









    Out[153]:





array([ 14.23,  13.2 ,  13.16,  14.37,  13.24,  14.2 ,  14.39,  14.06,
        14.83,  13.86,  14.1 ,  14.12,  13.75,  14.75,  14.38,  13.63,
        14.3 ,  13.83,  14.19,  13.64,  14.06,  12.93,  13.71,  12.85,
        13.5 ,  13.05,  13.39,  13.3 ,  13.87,  14.02,  13.73,  13.58,
        13.68,  13.76,  13.51,  13.48,  13.28,  13.05,  13.07,  14.22,
        13.56,  13.41,  13.88,  13.24,  13.05,  14.21,  14.38,  13.9 ,
        14.1 ,  13.94,  13.05,  13.83,  13.82,  13.77,  13.74,  13.56,
        14.22,  13.29,  13.72,  12.37,  12.33,  12.64,  13.67,  12.37,
        12.17,  12.37,  13.11,  12.37,  13.34,  12.21,  12.29,  13.86,
        13.49,  12.99,  11.96,  11.66,  13.03,  11.84,  12.33,  12.7 ,
        12.  ,  12.72,  12.08,  13.05,  11.84,  12.67,  12.16,  11.65,
        11.64,  12.08,  12.08,  12.  ,  12.69,  12.29,  11.62,  12.47,
        11.81,  12.29,  12.37,  12.29,  12.08,  12.6 ,  12.34,  11.82,
        12.51,  12.42,  12.25,  12.72,  12.22,  11.61,  11.46,  12.52,
        11.76,  11.41,  12.08,  11.03,  11.82,  12.42,  12.77,  12.  ,
        11.45,  11.56,  12.42,  13.05,  11.87,  12.07,  12.43,  11.79,
        12.37,  12.04,  12.86,  12.88,  12.81,  12.7 ,  12.51,  12.6 ,
        12.25,  12.53,  13.49,  12.84,  12.93,  13.36,  13.52,  13.62,
        12.25,  13.16,  13.88,  12.87,  13.32,  13.08,  13.5 ,  12.79,
        13.11,  13.23,  12.58,  13.17,  13.84,  12.45,  14.34,  13.48,
        12.36,  13.69,  12.85,  12.96,  13.78,  13.73,  13.45,  12.82,
        13.58,  13.4 ,  12.2 ,  12.77,  14.16,  13.71,  13.4 ,  13.27,
        13.17,  14.13])

Maximum and minimum within one dimension (column)



In [154]:

    
np.amax((data[:,1]))









    Out[154]:





14.83



In [156]:

    
np.amin((data[:,1]))









    Out[156]:





11.029999999999999



In [178]:

    
np.amax((data[:,1]))-np.amin((data[:,1]))









    Out[178]:





3.8000000000000007

reading-out all values of one dimension scaled between 0 and 1



In [14]:

    
dimension = 12
datanew = (data[:,dimension] - np.amin((data[:,dimension]))) / (np.amax((data[:,dimension]))-np.amin((data[:,dimension])))



In [15]:

    
datanew









    Out[15]:





array([ 0.97069597,  0.78021978,  0.6959707 ,  0.7985348 ,  0.60805861,
        0.57875458,  0.84615385,  0.84615385,  0.57875458,  0.83516484,
        0.6959707 ,  0.56776557,  0.5970696 ,  0.53479853,  0.63369963,
        0.58974359,  0.50549451,  0.47619048,  0.56776557,  0.76556777,
        0.89377289,  0.82417582,  1.        ,  0.86446886,  0.93406593,
        0.70695971,  0.71428571,  0.54945055,  0.78021978,  0.84981685,
        0.52747253,  0.58974359,  0.58608059,  0.63369963,  0.58608059,
        0.80586081,  0.55311355,  0.45421245,  0.52014652,  0.82783883,
        0.77289377,  0.63369963,  0.83882784,  0.63369963,  0.76190476,
        0.75457875,  0.79487179,  0.75457875,  0.54212454,  0.67032967,
        0.6007326 ,  0.76923077,  0.72893773,  0.60805861,  0.70695971,
        0.64468864,  0.74725275,  0.57509158,  0.58608059,  0.2014652 ,
        0.14652015,  0.11721612,  0.43589744,  0.58608059,  0.35164835,
        0.37728938,  0.6996337 ,  0.80952381,  0.24175824,  0.65934066,
        0.2014652 ,  0.69230769,  0.55311355,  0.81684982,  0.68131868,
        0.31868132,  0.44322344,  0.45787546,  0.38095238,  0.68131868,
        0.67765568,  0.68498168,  0.53113553,  0.27106227,  0.66300366,
        0.69230769,  0.36263736,  0.71062271,  0.54212454,  0.71062271,
        0.36630037,  0.50549451,  0.28937729,  0.74358974,  0.61904762,
        0.4981685 ,  0.36263736,  0.53846154,  0.54945055,  0.57142857,
        0.61904762,  0.54945055,  0.77289377,  0.42857143,  0.84249084,
        0.74358974,  0.6959707 ,  0.42124542,  0.64102564,  0.72893773,
        0.56410256,  0.55311355,  0.45054945,  0.38095238,  0.7032967 ,
        0.58608059,  0.75457875,  0.61904762,  0.31135531,  0.65201465,
        0.77655678,  0.88644689,  0.67765568,  0.67032967,  0.86813187,
        0.73626374,  0.57509158,  0.42857143,  0.55311355,  0.47619048,
        0.00732601,  0.05494505,  0.03296703,  0.00732601,  0.08791209,
        0.11355311,  0.        ,  0.15384615,  0.2014652 ,  0.32234432,
        0.38095238,  0.43956044,  0.28937729,  0.28571429,  0.26739927,
        0.15018315,  0.02197802,  0.21611722,  0.12820513,  0.02197802,
        0.01098901,  0.07326007,  0.02197802,  0.08791209,  0.1025641 ,
        0.07692308,  0.13553114,  0.16849817,  0.25274725,  0.18681319,
        0.11355311,  0.2014652 ,  0.30769231,  0.17582418,  0.15018315,
        0.17582418,  0.10622711,  0.17582418,  0.19413919,  0.23809524,
        0.20512821,  0.13186813,  0.16117216,  0.17216117,  0.10622711,
        0.10622711,  0.12820513,  0.12087912])
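The scaling used here is x' = (x - min) / (max - min), applied to one column at a time. A minimal sketch as a reusable function, checked on three values (the minimum, midpoint and maximum of column 1 found above); the function name minmax is an assumption for illustration:

```python
import numpy as np

def minmax(column):
    """Scale a 1-D array linearly so its minimum maps to 0 and its maximum to 1."""
    column = np.asarray(column, dtype=float)
    lo, hi = column.min(), column.max()
    return (column - lo) / (hi - lo)

scaled = minmax([11.03, 12.93, 14.83])
print(scaled)
```

This sketch assumes the column is not constant; if max equals min, the division yields NaN values.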

How to put together all dimensions?



In [6]:

    
data0 = (data[:,0] - np.amin((data[:,0]))) / (np.amax((data[:,0]))-np.amin((data[:,0])))
data1 = (data[:,1] - np.amin((data[:,1]))) / (np.amax((data[:,1]))-np.amin((data[:,1])))
data2 = (data[:,2] - np.amin((data[:,2]))) / (np.amax((data[:,2]))-np.amin((data[:,2])))



In [17]:

    
data_all = np.column_stack([data0, data1, data2])

http://docs.scipy.org/doc/numpy/reference/generated/numpy.column_stack.html

Use axis! http://stackoverflow.com/questions/23120621/less-code-lines-for-scaling-and-stacking-columns-in-numpy



In [5]:

    
data_all = (data - np.min(data, axis=0))/(np.max(data, axis=0) - np.min(data, axis=0))



In [15]:

    
mn, mx = data.min(0), data.max(0)
data_all = (data - mn)/(mx-mn)
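Passing axis=0 makes min and max return one value per column, and broadcasting then applies each column's minimum and range to every row. The sketch below shows this on a made-up 3 x 3 table. The guard for constant columns is an added assumption, not something the notebook does.

```python
import numpy as np

def minmax_columns(table):
    """Min-max scale every column of a 2-D array independently."""
    table = np.asarray(table, dtype=float)
    mn = table.min(axis=0)            # one minimum per column
    span = table.max(axis=0) - mn     # one range per column
    # Assumption: a constant column is mapped to 0 instead of dividing by zero.
    safe_span = np.where(span == 0, 1.0, span)
    return (table - mn) / safe_span

table = np.array([[1.0, 10.0, 5.0],
                  [2.0, 20.0, 5.0],
                  [3.0, 40.0, 5.0]])
scaled = minmax_columns(table)
print(scaled)
```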



In [6]:

    
data_all









    Out[6]:





array([[ 0.        ,  0.84210526,  0.1916996 , ...,  0.45528455,
         0.97069597,  0.56134094],
       [ 0.        ,  0.57105263,  0.2055336 , ...,  0.46341463,
         0.78021978,  0.55064194],
       [ 0.        ,  0.56052632,  0.3201581 , ...,  0.44715447,
         0.6959707 ,  0.64693295],
       ..., 
       [ 1.        ,  0.58947368,  0.69960474, ...,  0.08943089,
         0.10622711,  0.39728959],
       [ 1.        ,  0.56315789,  0.36561265, ...,  0.09756098,
         0.12820513,  0.40085592],
       [ 1.        ,  0.81578947,  0.66403162, ...,  0.10569106,
         0.12087912,  0.20114123]])

Write a CSV file with scaled values for later use in SC3



In [14]:

    
data_sc3 = 16+(data_all*22450)



In [17]:

    
np.savetxt("wine_data_scaled.csv", data_sc3, delimiter=",")
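The expression 16 + data_all * 22450 maps the unit interval onto the range 16 to 22466. A small sketch of the same mapping with explicit bounds, followed by a save and reload to confirm the file round-trips. The helper to_range, the sample values and the temporary file name are all illustrative assumptions.

```python
import os
import tempfile
import numpy as np

def to_range(unit_values, low, high):
    """Map values in [0, 1] linearly onto [low, high]."""
    return low + np.asarray(unit_values, dtype=float) * (high - low)

unit = np.array([[0.0, 0.5], [1.0, 0.25]])
freqs = to_range(unit, 16.0, 22466.0)   # same mapping as 16 + unit * 22450

# Write to a temporary location and read the values back.
path = os.path.join(tempfile.mkdtemp(), "scaled_example.csv")
np.savetxt(path, freqs, delimiter=",")
restored = np.loadtxt(path, delimiter=",")
print(restored)
```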

Check properties of data_all



In [13]:

    
data_all.shape









    Out[13]:





(178, 14)



In [19]:

    
data_all[1,:]









    Out[19]:





array([ 0.        ,  0.57105263,  0.2055336 ,  0.4171123 ,  0.03092784,
        0.32608696,  0.57586207,  0.51054852,  0.24528302,  0.27444795,
        0.26450512,  0.46341463,  0.78021978,  0.55064194])



In [20]:

    
data[1,:]









    Out[20]:





array([  1.00000000e+00,   1.32000000e+01,   1.78000000e+00,
         2.14000000e+00,   1.12000000e+01,   1.00000000e+02,
         2.65000000e+00,   2.76000000e+00,   2.60000000e-01,
         1.28000000e+00,   4.38000000e+00,   1.05000000e+00,
         3.40000000e+00,   1.05000000e+03])



In [22]:

    
import time
for i in range(14):
    print(data[1,i])
    time.sleep(0.4)



In [170]:

    
import time
for i in range(14):
    print(data_all[1,i])
    time.sleep(0.4)









    



0.0
0.571052631579
0.205533596838
0.417112299465
0.0309278350515
0.326086956522
0.575862068966
0.510548523207
0.245283018868
0.274447949527
0.264505119454
0.463414634146
0.78021978022
0.550641940086

Use oscgrain (function defined above)

Read out and play (sonify) one data point across its n dimensions



In [214]:

    
import time, sys
for i in range(14):
    oscgrain(200+(1500*data_all[123,i]))
    time.sleep(0.09)

Read out all data points sequentially



In [215]:

    
for i in range(178):
    for ii in range (14):
        oscgrain(200+(1500*data_all[i,ii]))
        time.sleep(0.01) # 14*0.01=0.14 length/period time
    time.sleep(0.2)
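The nested loop above can be sketched as a function that takes the sound-producing call as a parameter, which makes the frequency mapping (200 + 1500 * value) testable without a running server. The name sonify_rows and its parameters are assumptions for illustration; here a list's append method stands in for the real playback call.

```python
import time
import numpy as np

def sonify_rows(rows, play, base=200.0, span=1500.0, step=0.01, pause=0.2):
    """Call play(frequency) for every value of every row, with timed gaps."""
    for row in np.asarray(rows, dtype=float):
        for value in row:
            play(base + span * value)   # map a value in [0, 1] to a frequency
            time.sleep(step)            # gap between dimensions
        time.sleep(pause)               # gap between data points

played = []
sonify_rows([[0.0, 0.5], [1.0, 0.25]], played.append, step=0.0, pause=0.0)
print(played)
```

In the notebook itself, play would be oscgrain and rows would be data_all.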

See the notebook "EXPLORATIONS WITH ARRAY READING USING OSCGRAIN"

Next steps

Is this related to parallel coordinate visualization?

Combine "read out and play/sonify one data point across n dimensions" with brushing in a scatterplot matrix?

Version information



In [74]:

    
%load_ext version_information
%version_information numpy, scipy, matplotlib, sympy, pyosc









    



The version_information extension is already loaded. To reload it, use:
  %reload_ext version_information






    Out[74]:




Software Version
Python 2.7.6 |Anaconda 1.9.1 (x86_64)| (default, Jan 10 2014, 11:23:15) [GCC 4.0.1 (Apple Inc. build 5493)]
IPython 1.2.1
OS posix [darwin]
numpy 1.8.1
scipy 0.13.3
matplotlib 1.3.1
sympy 0.7.5
pyosc 0.3.5b-5294
Thu Apr 17 09:54:28 2014 CEST


